(optional) Using HuggingFace API for the worker

An alternative to watsonx is the Hugging Face API. However, model choice and performance are very limited on the free tier. As of September 2023, Llama 2 is not available for free, so you will use the Falcon-7B model instead.

Install the specified versions of LangChain and Hugging Face Hub by copying and pasting the following commands into your terminal:

```
pip install langchain==0.1.17
pip install huggingface-hub==0.23.4
```

You need to update `init_llm()` and insert your Hugging Face API key.

  • Note: Hugging Face hosts a number of LLMs that can be called for free (if the model is below 10 GB). You can also download any model and run it locally.

The `HuggingFaceHub` object is created with the specified `repo_id` and additional parameters such as `temperature`, `max_new_tokens`, and `max_length` to control the behavior of the model. More examples are available in the LangChain documentation.

  • The embeddings are initialized with the `HuggingFaceInstructEmbeddings` class and the pre-trained model `sentence-transformers/all-MiniLM-L6-v2`; a leaderboard of embedding models is available on the Hugging Face MTEB leaderboard. This embedding model offers a good balance between performance and speed.

  • The model uses the specified device (CPU or GPU) for computation.
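Under the hood, the retriever ranks chunks by vector similarity between the query embedding and each chunk embedding. As a rough illustration of the idea (not the actual library code), cosine similarity between two vectors can be computed in plain Python:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # identical direction -> 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal -> 0.0
```

The vector store does this comparison efficiently over all stored chunks and returns the closest matches.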

To do: Complete the function init_llm()

```python
def init_llm():
    global llm_hub, embeddings
    # Set up the environment variable for HuggingFace and initialize the desired model.
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = "Your HuggingFace API"
    # Insert the repo name of the model
    model_id = "tiiuae/falcon-7b-instruct"
    # load the model into the HuggingFaceHub
    llm_hub = # --> specify Hugging Face Hub object with (repo_id, model_kwargs={"temperature": 0.1, "max_new_tokens": 600, "max_length": 600})
    # Initialize embeddings using a pre-trained model to represent the text data.
    embeddings = # --> create object of Hugging Face Instruct Embeddings with (model_name, model_kwargs={"device": DEVICE})
```
Click here to see the solution
```python
# Function to initialize the language model and its embeddings
def init_llm():
    global llm_hub, embeddings
    # Set up the environment variable for HuggingFace and initialize the desired model.
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = "YOUR API KEY"
    # repo name for the model
    model_id = "tiiuae/falcon-7b-instruct"
    # load the model into the HuggingFaceHub
    llm_hub = HuggingFaceHub(repo_id=model_id, model_kwargs={"temperature": 0.1, "max_new_tokens": 600, "max_length": 600})
    # Initialize embeddings using a pre-trained model to represent the text data.
    embeddings = HuggingFaceInstructEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2", model_kwargs={"device": DEVICE}
    )
```

You also need to insert your LLM API key. Here's how to get one.

Generate a Hugging Face API key from your account with the following steps:

  1. Go to https://huggingface.co/
  2. Log in to your account (or sign up for free if this is your first time)
  3. Go to Settings -> Access Tokens -> click on New Token (refer to the image below)
  4. Select either the read or write option and copy the token
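Rather than hardcoding the token in worker.py, you can export it in your shell (`export HUGGINGFACEHUB_API_TOKEN=hf_...`) and read it from the environment; `HUGGINGFACEHUB_API_TOKEN` is the variable LangChain's `HuggingFaceHub` wrapper looks for. A minimal sketch (the helper name `get_hf_token` and the placeholder value are our own, not part of the lab code):

```python
import os

def get_hf_token():
    # Read the token from the environment instead of committing it to source control.
    token = os.environ.get("HUGGINGFACEHUB_API_TOKEN")
    if not token:
        raise RuntimeError(
            "HUGGINGFACEHUB_API_TOKEN is not set; run "
            "'export HUGGINGFACEHUB_API_TOKEN=hf_...' before starting the server."
        )
    return token

# Simulate the exported variable (the placeholder below is not a real token):
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_example_placeholder"
print(get_hf_token())  # -> hf_example_placeholder
```

Keeping the key out of the source file avoids accidentally publishing it if you push the project to a public repository.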
HuggingFace Token
Click here to see the complete worker.py for the Hugging Face version
```python
import os
import torch
from langchain import PromptTemplate
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.llms import HuggingFaceHub

# Check for GPU availability and set the appropriate device for computation.
DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"

# Global variables
conversation_retrieval_chain = None
chat_history = []
llm_hub = None
embeddings = None

# Function to initialize the language model and its embeddings
def init_llm():
    global llm_hub, embeddings
    # Set up the environment variable for HuggingFace and initialize the desired model.
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = "YOUR API KEY"
    # repo name for the model
    model_id = "tiiuae/falcon-7b-instruct"
    # load the model into the HuggingFaceHub
    llm_hub = HuggingFaceHub(repo_id=model_id, model_kwargs={"temperature": 0.1, "max_new_tokens": 600, "max_length": 600})
    # Initialize embeddings using a pre-trained model to represent the text data.
    embeddings = HuggingFaceInstructEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2", model_kwargs={"device": DEVICE}
    )

# Function to process a PDF document
def process_document(document_path):
    global conversation_retrieval_chain
    # Load the document
    loader = PyPDFLoader(document_path)
    documents = loader.load()
    # Split the document into chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=64)
    texts = text_splitter.split_documents(documents)
    # Create an embeddings database using Chroma from the split text chunks.
    db = Chroma.from_documents(texts, embedding=embeddings)
    # Build the QA chain, which utilizes the LLM and retriever for answering questions.
    # By default, the vector store retriever uses similarity search.
    # You can also specify search kwargs such as k, which controls how many
    # search results are sent to the LLM.
    conversation_retrieval_chain = RetrievalQA.from_chain_type(
        llm=llm_hub,
        chain_type="stuff",
        retriever=db.as_retriever(search_kwargs={"k": 3}),
        return_source_documents=False,
        input_key="question"
        # chain_type_kwargs={"prompt": prompt}  # uncomment this if you are using a prompt template
    )

# Function to process a user prompt
def process_prompt(prompt):
    global conversation_retrieval_chain
    global chat_history
    # Query the model
    output = conversation_retrieval_chain.invoke({"question": prompt, "chat_history": chat_history})
    answer = output["result"]
    # Update the chat history
    chat_history.append((prompt, answer))
    # Strip the "Helpful Answer:" prefix that the chain's raw output may include
    if "Helpful Answer:" in answer:
        answer = answer.split("Helpful Answer:")[-1].strip()
    else:
        answer = answer.strip()
    # Return the model's response
    return answer

# Initialize the language model
init_llm()
```
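To build intuition for the `chunk_size` and `chunk_overlap` parameters used in `process_document`, here is a simplified fixed-size sketch in plain Python. (The real `RecursiveCharacterTextSplitter` is smarter: it prefers to split on paragraph, line, and word boundaries before falling back to raw character positions.)

```python
def split_text(text, chunk_size=10, chunk_overlap=4):
    # Produce chunks of at most chunk_size characters; consecutive chunks
    # share chunk_overlap characters so context is not cut mid-thought.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("abcdefghijklmnop", chunk_size=10, chunk_overlap=4)
print(chunks)  # -> ['abcdefghij', 'ghijklmnop', 'mnop']
```

Note how each chunk repeats the last 4 characters of the previous one; in the worker above the overlap is 64 characters on 1024-character chunks, so a sentence falling on a chunk boundary still appears whole in at least one chunk.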

To launch your chatbot, you first need to run the server.py file:

```
python3 server.py
```

Now click the following button to open your application:

A new window will open for your application.

Chatbot Application